Ron M. Roth and Vitaly Skachek



Abstract — A construction of expander codes is presented with 
the following three properties: (i) the codes lie close to the 
Singleton bound, (ii) they can be encoded in time complexity 
that is linear in their code length, and (iii) they have a linear- 
time bounded-distance decoder. By using a version of the decoder 
that corrects also erasures, the codes can replace MDS outer 
codes in concatenated constructions, thus resulting in linear- 
time encodable and decodable codes that approach the Zyablov 
bound or the capacity of memoryless channels. The presented 
construction improves on an earlier result by Guruswami and 
Indyk in that any rate and relative minimum distance that lies 
below the Singleton bound is attainable for a significantly smaller 
alphabet size. 

Keywords: Concatenated codes, Expander codes, Graph codes, 
Iterative decoding, Linear-time decoding, Linear-time encoding, 
MDS codes. 



I. Introduction 

In this work, we consider a family of codes that are based on 
expander graphs. The notion of graph codes was introduced by 
Tanner in [19]. Later, the explicit constructions of Ramanujan 
expander graphs due to Lubotzky, Phillips, and Sarnak [8,
Chapter 4], [13] and Margulis [15], were used by Alon et 
al. in [1] as building blocks to obtain new polynomial-time 
constructions of asymptotically good codes in the low-rate 
range (by "asymptotically good codes" we mean codes whose 
rate and relative minimum distance are both bounded away 
from zero). Expander graphs were then used by Sipser and
Spielman in [16] to present polynomial-time constructions of 
asymptotically good codes that can be decoded in time com- 
plexity which is linear in the code length. By combining ideas 
from [1] and [16], Spielman provided in [18] an asymptotically 
good construction where both the decoding and encoding time 
complexities were linear in the code length. 

While the linear-time decoder of the Sipser-Spielman con- 
struction was guaranteed to correct a number of errors that 
is a positive fraction of the code length, that fraction was 
significantly smaller than what one could attain by bounded- 
distance decoding — namely, decoding up to half the mini- 
mum distance of the code. The guaranteed fraction of linear- 
time correctable errors was substantially improved by Zemor 
in [20]. In his analysis, Zemor considered the special (yet 
abundant) case of the Sipser-Spielman construction where 
the underlying Ramanujan graph is bipartite, and presented 
a linear-time iterative decoder where the correctable fraction 
was 1/4 of the relative minimum distance of the code. An 
additional improvement by a factor of two, which brought 
the (linear-time correctable) fraction to be essentially equal
to that of bounded-distance decoding, was then achieved by
the authors of this paper in [17], where the iterative decoder of 
Zemor was enhanced through a technique akin to generalized 
minimum distance (GMD) decoding [10], [11]. 

In [12], Guruswami and Indyk used Zemor's construction 
as a building block and combined it with methods from [1], 
[3], and [4] to suggest a code construction with the following 
three properties: 

(P1) The construction is nearly-MDS: it yields for every
designed rate $R \in (0,1]$ and sufficiently small $\epsilon > 0$
an infinite family of codes of rate at least $R$ over an
alphabet of size

\[ 2^{O((\log(1/\epsilon))/(R\epsilon^4))} \tag{1} \]

and the relative minimum distance of the codes is greater
than $1 - R - \epsilon$.

(P2) The construction is linear-time encodable, and the time
complexity per symbol is POLY$(1/\epsilon)$ (i.e., this complexity
grows polynomially with $1/\epsilon$).

(P3) The construction has a linear-time decoder which is
essentially a bounded-distance decoder: the correctable
number of errors is at least a fraction $(1 - R - \epsilon)/2$ of
the code length. The time complexity per symbol of the
decoder is also POLY$(1/\epsilon)$.

In fact, the decoder described by Guruswami and Indyk in [12] 
is more general in that it can handle a combination of errors 
and erasures. Thus, by using their codes as an outer code in a 
concatenated construction, one obtains a linear-time encodable 
code that attains the Zyablov bound [9, p. 1949], with a linear- 
time bounded-distance decoder. Alternatively, such a con- 
catenated construction approaches the capacity of any given 
memoryless channel: if the inner code is taken to have the 
smallest decoding error exponent, then the overall decoding 
error probability behaves like Forney's error exponent [10], 
[11] (the time complexity of searching for the inner code,
in turn, depends on $\epsilon$, yet not on the overall length of the
concatenated code).

Codes with similar attributes, both with respect to the 
Zyablov bound and to the capacity of memoryless channels, 
were presented also by Barg and Zemor in a sequence of pa- 
pers [5], [6], [7] (yet in their constructions, only the decoding 
is guaranteed to be linear-time). 

In this work, we present a family of codes which improves 
on the Guruswami-Indyk construction. Specifically, our codes 
will satisfy properties (P1)-(P3), except that the alphabet size
in property (P1) will now be only

\[ 2^{O((\log(1/\epsilon))/\epsilon^3)} . \tag{2} \]

The basic ingredients of our construction are similar to those
used in [12] (and also in [3] and [4]), yet their layout (in
particular, the order of application of the various building blocks)
and the choice of parameters will be different. Our presentation
will be split into two parts. We first describe in Section II a
construction that satisfies only the two properties (P1) and (P3)
over an alphabet of size (2). These two properties will be
proved in Sections III and IV. We also show that the codes
studied by Barg and Zemor in [5] and [7] can be seen as
concatenated codes, with our codes serving as the outer codes.

The second part of our presentation consists of Section V,
where we modify the construction of Section II and use the
resulting code as a building block in a second construction,
which satisfies property (P2) as well.

II. Construction of linear-time decodable codes 

Let $\mathcal{G} = (V' : V'', E)$ be a bipartite $\Delta$-regular undirected
connected graph with a vertex set $V = V' \cup V''$ such that
$V' \cap V'' = \emptyset$, and an edge set $E$ such that every edge in $E$
has one endpoint in $V'$ and one endpoint in $V''$. We denote the
size of $V'$ by $n$ (clearly, $n$ is also the size of $V''$) and we will
assume hereafter without any practical loss of generality that
$n > 1$. For every vertex $u \in V$, we denote by $E(u)$ the set of
edges that are incident with $u$. We assume an ordering on $V$,
thereby inducing an ordering on the edges of $E(u)$ for every
$u \in V$. For an alphabet $F$ and a word $z = (z_e)_{e \in E}$ (whose
entries are indexed by $E$) in $F^{|E|}$, we denote by $(z)_{E(u)}$ the
sub-block of $z$ that is indexed by $E(u)$.

Let $F$ be the field GF($q$) and let $\mathcal{C}'$ and $\mathcal{C}''$ be linear
$[\Delta, r\Delta, \theta\Delta]$ and $[\Delta, R\Delta, \delta\Delta]$ codes over $F$, respectively. We
define the code $\mathbb{C} = (\mathcal{G}, \mathcal{C}' : \mathcal{C}'')$ as the following linear code
of length $|E|$ over $F$:

\[ \mathbb{C} = \left\{ c \in F^{|E|} \;:\; (c)_{E(u)} \in \mathcal{C}' \text{ for every } u \in V' \text{ and } (c)_{E(v)} \in \mathcal{C}'' \text{ for every } v \in V'' \right\} \]

($\mathbb{C}$ is the primary code considered by Barg and Zemor in [5]).

Let $\Phi$ be the alphabet $F^{r\Delta}$. Fix some linear one-to-one
mapping $\mathcal{E} : \Phi \to \mathcal{C}'$ over $F$, and let the mapping
$\psi_{\mathcal{E}} : \mathbb{C} \to \Phi^n$ be given by

\[ \psi_{\mathcal{E}}(c) = \left( \mathcal{E}^{-1}\big( (c)_{E(u)} \big) \right)_{u \in V'} , \quad c \in \mathbb{C} . \tag{3} \]

That is, the entries of $\psi_{\mathcal{E}}(c)$ are indexed by $V'$, and the entry
that is indexed by $u \in V'$ equals $\mathcal{E}^{-1}((c)_{E(u)})$. We now define
the code $(\mathbb{C})_\Phi$ of length $n$ over $\Phi$ by

\[ (\mathbb{C})_\Phi = \{ \psi_{\mathcal{E}}(c) \;:\; c \in \mathbb{C} \} . \]

Every codeword $x = (x_u)_{u \in V'}$ of $(\mathbb{C})_\Phi$ (with entries $x_u$ in
$\Phi$) is associated with a unique codeword $c \in \mathbb{C}$ such that

\[ \mathcal{E}(x_u) = (c)_{E(u)} , \quad u \in V' . \]

Based on the definition of $(\mathbb{C})_\Phi$, the code $\mathbb{C}$ can be
represented as a concatenated code with an inner code $\mathcal{C}'$ over $F$
and an outer code $(\mathbb{C})_\Phi$ over $\Phi$. It is possible, however, to use
$(\mathbb{C})_\Phi$ as an outer code with inner codes other than $\mathcal{C}'$. Along
these lines, the codes studied in [5] and [7] can be represented
as concatenated codes with $(\mathbb{C})_\Phi$ as an outer code, whereas
the inner codes are taken over a sub-field of $F$.
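The definitions of this section can be made concrete on a very small instance. The following is only an illustrative sketch under assumptions that are not part of the construction above: the graph is taken to be the complete bipartite graph $K_{3,3}$ (so $n = \Delta = 3$), both constituent codes are the binary $[3,2,2]$ single-parity-check code, and the encoder $\mathcal{E}$ is systematic. The names `is_codeword`, `encode_inner`, and `psi` are hypothetical helpers.

```python
from itertools import product

# Toy instance (illustrative assumptions): G = K_{3,3}, so n = Delta = 3,
# and edge (u, v) joins vertex u of V' to vertex v of V''.
N = 3

def in_parity_code(block):
    """Membership in the binary [3, 2, 2] single-parity-check code."""
    return sum(block) % 2 == 0

def is_codeword(c):
    """c maps each edge (u, v) to a bit; check the constraint at every vertex."""
    rows_ok = all(in_parity_code([c[(u, v)] for v in range(N)]) for u in range(N))
    cols_ok = all(in_parity_code([c[(u, v)] for u in range(N)]) for v in range(N))
    return rows_ok and cols_ok

def encode_inner(x):
    """Linear one-to-one map E : Phi = F^2 -> C' (systematic, parity appended)."""
    return [x[0], x[1], (x[0] + x[1]) % 2]

def psi(c):
    """psi_E(c): entry u is E^{-1}((c)_{E(u)}); here E^{-1} drops the parity bit."""
    return [tuple(c[(u, v)] for v in range(2)) for u in range(N)]

def all_codewords():
    edges = [(u, v) for u in range(N) for v in range(N)]
    for bits in product([0, 1], repeat=len(edges)):
        c = dict(zip(edges, bits))
        if is_codeword(c):
            yield c

codewords = list(all_codewords())            # the code (G, C' : C'') over F = GF(2)
outer = {tuple(psi(c)) for c in codewords}   # the code of length n over Phi = F^2
```

In this toy case the graph code is the product of two parity-check codes, and `outer` is its image under $\psi_{\mathcal{E}}$, a code of length 3 over an alphabet of size 4.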



III. Bounds on the code parameters 

Let $\mathbb{C} = (\mathcal{G}, \mathcal{C}' : \mathcal{C}'')$, $\Phi$, and $(\mathbb{C})_\Phi$ be as defined in
Section II. It was shown in [5] that the rate of $\mathbb{C}$ is at least
$r + R - 1$. From the fact that $\mathbb{C}$ is a concatenated code with
an inner code $\mathcal{C}'$ and an outer code $(\mathbb{C})_\Phi$, it follows that the
rate of $(\mathbb{C})_\Phi$ is bounded from below by

\[ \frac{r + R - 1}{r} = 1 - \frac{1}{r} + \frac{R}{r} . \tag{4} \]

In particular, the rate approaches $R$ when $r \to 1$.
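As a quick numerical companion to the bound (4), the sketch below evaluates it with exact rational arithmetic. The function name `outer_rate_lower_bound` is a hypothetical helper, not notation from the paper.

```python
from fractions import Fraction

def outer_rate_lower_bound(r, R):
    """Lower bound (4) on the rate of the outer code: (r + R - 1) / r."""
    r, R = Fraction(r), Fraction(R)
    return (r + R - 1) / r

# With r = 1 the bound equals R; with r = R = 2/3 it equals 1/2.
bound_at_r_one = outer_rate_lower_bound(1, Fraction(1, 2))
bound_two_thirds = outer_rate_lower_bound(Fraction(2, 3), Fraction(2, 3))
```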

We next turn to computing a lower bound on the relative
minimum distance of $(\mathbb{C})_\Phi$. By applying this lower bound, we
will then verify that $(\mathbb{C})_\Phi$ satisfies property (P1). Our analysis
is based on that in [7], and we obtain here an improvement
over a bound that can be inferred from [7]; we will need that
improvement to get the reduction of the alphabet size from (1)
to (2). We first introduce several notations.

Denote by $A_{\mathcal{G}}$ the adjacency matrix of $\mathcal{G}$; namely, $A_{\mathcal{G}}$ is a
$|V| \times |V|$ real symmetric matrix whose rows and columns are
indexed by the set $V$, and for every $u, v \in V$, the entry in $A_{\mathcal{G}}$
that is indexed by $(u, v)$ is given by

\[ (A_{\mathcal{G}})_{u,v} = \begin{cases} 1 & \text{if } \{u,v\} \in E \\ 0 & \text{otherwise} \end{cases} . \]

It is known that $\Delta$ is the largest eigenvalue of $A_{\mathcal{G}}$. We denote
by $\gamma_{\mathcal{G}}$ the ratio between the second largest eigenvalue of
$A_{\mathcal{G}}$ and $\Delta$ (this ratio is less than 1 when $\mathcal{G}$ is connected
and is nonnegative when $n > 1$; see [8, Propositions 1.1.2
and 1.1.4]).

When $\mathcal{G}$ is taken from a sequence of Ramanujan expander
graphs with constant degree $\Delta$, such as the LPS graphs in [13],
we have

\[ \gamma_{\mathcal{G}} \le \frac{2\sqrt{\Delta - 1}}{\Delta} . \]

For a nonempty subset $S$ of the vertex set $V$ of $\mathcal{G}$, we will
use the notation $\mathcal{G}_S$ to stand for the subgraph of $\mathcal{G}$ that is
induced by $S$: the vertex set of $\mathcal{G}_S$ is given by $S$, and its
edge set consists of all the edges in $\mathcal{G}$ that have each of their
endpoints in $S$. The degree of $u$ in $\mathcal{G}_S$, which is the number
of adjacent vertices to $u$ in $\mathcal{G}_S$, will be denoted by $\deg_S(u)$.

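Theorem 3.1: The relative minimum distance of the code
$(\mathbb{C})_\Phi$ is bounded from below by

\[ \frac{\delta - \gamma_{\mathcal{G}} \sqrt{\delta/\theta}}{1 - \gamma_{\mathcal{G}}} . \]

In particular, this lower bound approaches $\delta$ when $\gamma_{\mathcal{G}} \to 0$.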
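To get a feel for how the bound of Theorem 3.1 behaves, the following sketch evaluates it numerically together with the Ramanujan bound on $\gamma_{\mathcal{G}}$. The function names are hypothetical, and floating-point arithmetic is used only for illustration.

```python
import math

def ramanujan_gamma_bound(Delta):
    """Upper bound 2*sqrt(Delta - 1)/Delta on gamma for a Ramanujan graph."""
    return 2 * math.sqrt(Delta - 1) / Delta

def distance_lower_bound(delta, theta, gamma):
    """Lower bound of Theorem 3.1 on the relative minimum distance."""
    return (delta - gamma * math.sqrt(delta / theta)) / (1 - gamma)

# The bound equals delta when gamma = 0 and degrades as gamma grows.
no_loss = distance_lower_bound(0.5, 0.25, 0.0)
with_loss = distance_lower_bound(0.5, 0.5, 0.1)
```

For instance, with $\delta = \theta = 1/2$ and $\gamma_{\mathcal{G}} = 0.1$ the bound is $0.4/0.9 = 4/9$, slightly below $\delta$.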



The proof of the theorem will make use of Proposition 3.3
below, which is an improvement on Corollary 9.2.5 in Alon
and Spencer [2] for bipartite graphs, and is also an improvement
on Lemma 4 in Zemor [20]. We will need the following
technical lemma for that proposition. The proof of this lemma
can be found in Appendix A.

Denote by $\mathcal{N}(u)$ the set of vertices that are adjacent to
vertex $u$ in $\mathcal{G}$.

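Lemma 3.2: Let $\chi$ be a real function on the vertices of $\mathcal{G}$
where the images of $\chi$ are restricted to the interval $[0,1]$. Write

\[ \sigma = \frac{1}{n} \sum_{u \in V'} \chi(u) \quad \text{and} \quad \tau = \frac{1}{n} \sum_{v \in V''} \chi(v) . \]

Then

\[ \frac{1}{\Delta n} \sum_{u \in V'} \sum_{v \in \mathcal{N}(u)} \chi(u)\chi(v) \;\le\; \sigma\tau + \gamma_{\mathcal{G}} \sqrt{\sigma(1-\sigma)\tau(1-\tau)} \;\le\; (1-\gamma_{\mathcal{G}})\sigma\tau + \gamma_{\mathcal{G}}\sqrt{\sigma\tau} . \]

(Comparing to the results in [20], Lemma 4 therein is stated
for the special case where the images of $\chi$ are either 0 or 1.
Our first inequality in Lemma 3.2 yields a bound which is
always at least as tight as Lemma 4 in [20].)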
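The relaxed form $(1-\gamma_{\mathcal{G}})\sigma\tau + \gamma_{\mathcal{G}}\sqrt{\sigma\tau}$, which is the form used in the proof of Proposition 3.3, is never smaller than the sharper expression $\sigma\tau + \gamma_{\mathcal{G}}\sqrt{\sigma(1-\sigma)\tau(1-\tau)}$, because $(1-\sigma)(1-\tau) \le (1-\sqrt{\sigma\tau})^2$. The sketch below is only a numerical sanity check of that comparison on a grid; the function names and the grid resolution are arbitrary choices.

```python
import math

def sharp_bound(sigma, tau, gamma):
    """sigma*tau + gamma*sqrt(sigma(1-sigma)tau(1-tau))."""
    return sigma * tau + gamma * math.sqrt(sigma * (1 - sigma) * tau * (1 - tau))

def relaxed_bound(sigma, tau, gamma):
    """(1-gamma)*sigma*tau + gamma*sqrt(sigma*tau)."""
    return (1 - gamma) * sigma * tau + gamma * math.sqrt(sigma * tau)

def check_grid(steps=20, gammas=(0.0, 0.3, 0.9)):
    """Return True if sharp_bound <= relaxed_bound on the whole grid."""
    for g in gammas:
        for i in range(steps + 1):
            for j in range(steps + 1):
                s, t = i / steps, j / steps
                if sharp_bound(s, t, g) > relaxed_bound(s, t, g) + 1e-12:
                    return False
    return True
```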

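Proposition 3.3: Let $S \subseteq V'$ and $T \subseteq V''$ be subsets of
sizes $|S| = \sigma n$ and $|T| = \tau n$, respectively, such that $\sigma + \tau > 0$.
Then the sum of the degrees in the graph $\mathcal{G}_{S \cup T}$ is bounded
from above by

\[ \sum_{u \in S \cup T} \deg_{S \cup T}(u) \;\le\; 2 \left( (1-\gamma_{\mathcal{G}})\sigma\tau + \gamma_{\mathcal{G}}\sqrt{\sigma\tau} \right) \Delta n . \]

Proof: We select $\chi(u)$ in Lemma 3.2 to be

\[ \chi(u) = \begin{cases} 1 & \text{if } u \in S \cup T \\ 0 & \text{otherwise} \end{cases} . \]

On the one hand, by Lemma 3.2,

\[ \sum_{u \in V'} \sum_{v \in \mathcal{N}(u)} \chi(u)\chi(v) \;\le\; \left( (1-\gamma_{\mathcal{G}})\sigma\tau + \gamma_{\mathcal{G}}\sqrt{\sigma\tau} \right) \Delta n . \]

On the other hand,

\[ 2 \sum_{u \in V'} \sum_{v \in \mathcal{N}(u)} \chi(u)\chi(v) = \sum_{u \in S \cup T} \deg_{S \cup T}(u) . \]

These two equations yield the desired result. □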
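Proposition 3.3 can be checked exhaustively on a small graph. The sketch below does this for the 8-cycle, which is bipartite, 2-regular, and connected with $n = 4$. The choice of graph is an assumption made purely for illustration; its adjacency eigenvalues are $2\cos(2\pi j/8)$, so the second largest is $\sqrt{2}$ and $\gamma_{\mathcal{G}} = \cos(\pi/4)$. All names are hypothetical.

```python
import math
from itertools import combinations

M = 8                       # cycle C_8: bipartite, 2-regular, connected
DELTA, N = 2, M // 2
V1 = [0, 2, 4, 6]           # V'
V2 = [1, 3, 5, 7]           # V''
EDGES = [(i, (i + 1) % M) for i in range(M)]
GAMMA = math.cos(2 * math.pi / M)   # second-largest eigenvalue sqrt(2), over Delta

def degree_sum(S, T):
    """Sum of degrees in the subgraph induced by S and T (twice its edge count)."""
    inside = set(S) | set(T)
    return 2 * sum(1 for (a, b) in EDGES if a in inside and b in inside)

def bound(S, T):
    """Right-hand side of Proposition 3.3."""
    sigma, tau = len(S) / N, len(T) / N
    return 2 * ((1 - GAMMA) * sigma * tau + GAMMA * math.sqrt(sigma * tau)) * DELTA * N

def subsets(vertices):
    for k in range(len(vertices) + 1):
        yield from combinations(vertices, k)

def proposition_holds():
    """Check the bound for every pair (S, T) with S or T nonempty."""
    return all(degree_sum(S, T) <= bound(S, T) + 1e-9
               for S in subsets(V1) for T in subsets(V2) if S or T)
```

The bound is met with equality when $S = V'$ and $T = V''$, which is why a small floating-point tolerance is included in the comparison.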

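Proof of Theorem 3.1: First, it is easy to see that $(\mathbb{C})_\Phi$
is a linear subspace over $F$ and, as such, it is an Abelian
subgroup of $\Phi^n$. Thus, the minimum distance of $(\mathbb{C})_\Phi$ equals
the minimum weight (over $\Phi$) of any nonzero codeword of
$(\mathbb{C})_\Phi$.

Pick any nonzero codeword $x \in (\mathbb{C})_\Phi$, and let $c = (c_e)_{e \in E}$
be the unique codeword in $\mathbb{C}$ such that $x = \psi_{\mathcal{E}}(c)$. Denote by
$Y \subseteq E$ the support of $c$ (over $F$), i.e.,

\[ Y = \{ e \in E \;:\; c_e \ne 0 \} . \]

Let $S$ (respectively, $T$) be the set of all vertices in $V'$
(respectively, $V''$) that are endpoints of edges in $Y$. In particular,
$S$ is the support of the codeword $x$. Let $\sigma$ and $\tau$ denote the
ratios $|S|/n$ and $|T|/n$, respectively, and consider the subgraph
$\mathcal{G}(Y) = (S : T, Y)$ of $\mathcal{G}$. Since the minimum distance of $\mathcal{C}'$ is
$\theta\Delta$, the degree in $\mathcal{G}(Y)$ of every vertex in $V'$ is at least $\theta\Delta$.
Therefore, the number of edges in $\mathcal{G}(Y)$ satisfies

\[ |Y| \ge \theta\Delta \cdot \sigma n . \]

Similarly, the degree in $\mathcal{G}(Y)$ of every vertex in $V''$ is at least
$\delta\Delta$ and, thus,

\[ |Y| \ge \delta\Delta \cdot \tau n . \]

Therefore,

\[ |Y| \ge \max\{ \theta\sigma, \delta\tau \} \cdot \Delta n . \]

On the other hand, $\mathcal{G}(Y)$ is a subgraph of $\mathcal{G}_{S \cup T}$; hence, by
Proposition 3.3,

\[ |Y| \le \frac{1}{2} \sum_{u \in S \cup T} \deg_{S \cup T}(u) \le \left( (1-\gamma_{\mathcal{G}})\sigma\tau + \gamma_{\mathcal{G}}\sqrt{\sigma\tau} \right) \Delta n . \]

Combining the last two equations yields

\[ \max\{ \theta\sigma, \delta\tau \} \le (1-\gamma_{\mathcal{G}})\sigma\tau + \gamma_{\mathcal{G}}\sqrt{\sigma\tau} . \tag{5} \]

We now distinguish between two cases.

Case 1: $\sigma/\tau \le \delta/\theta$. Here (5) becomes

\[ \delta\tau \le (1-\gamma_{\mathcal{G}})\sigma\tau + \gamma_{\mathcal{G}}\sqrt{\sigma\tau} \]

and, so,

\[ \sigma \ge \frac{\delta - \gamma_{\mathcal{G}}\sqrt{\sigma/\tau}}{1 - \gamma_{\mathcal{G}}} \ge \frac{\delta - \gamma_{\mathcal{G}}\sqrt{\delta/\theta}}{1 - \gamma_{\mathcal{G}}} . \tag{6} \]

Case 2: $\sigma/\tau > \delta/\theta$. By exchanging between $\sigma$ and $\tau$ and
between $\theta$ and $\delta$ in (6), we get

\[ \tau \ge \frac{\theta - \gamma_{\mathcal{G}}\sqrt{\theta/\delta}}{1 - \gamma_{\mathcal{G}}} . \]

Therefore,

\[ \sigma > \frac{\delta}{\theta} \cdot \tau \ge \frac{\delta}{\theta} \cdot \frac{\theta - \gamma_{\mathcal{G}}\sqrt{\theta/\delta}}{1 - \gamma_{\mathcal{G}}} = \frac{\delta - \gamma_{\mathcal{G}}\sqrt{\delta/\theta}}{1 - \gamma_{\mathcal{G}}} . \]

Either case yields the desired lower bound on the size, $\sigma n$,
of the support $S$ of $x$. □

The next example demonstrates how the parameters of $(\mathbb{C})_\Phi$
can be tuned so that the improvement (2) of property (P1)
holds.

Example 3.1: Fix $\theta = \epsilon$ for some small $\epsilon \in (0, 1]$ (in which
case $r > 1 - \epsilon$), and then select $q$ and $\Delta$ so that $q > \Delta \ge 4/\epsilon^3$.
For such parameters, we can take $\mathcal{C}'$ and $\mathcal{C}''$ to be generalized
Reed-Solomon (GRS) codes over $F$. We also assume that $\mathcal{G}$
is a Ramanujan bipartite graph, in which case

\[ \gamma_{\mathcal{G}} \le \frac{2\sqrt{\Delta - 1}}{\Delta} < \frac{2}{\sqrt{\Delta}} \le \epsilon^{3/2} . \]

By (4), the rate of $(\mathbb{C})_\Phi$ is bounded from below by

\[ 1 - \frac{1}{1-\epsilon} + \frac{R}{1-\epsilon} > R - \epsilon \]

and by Theorem 3.1, the relative minimum distance is at least

\[ \frac{\delta - \gamma_{\mathcal{G}}\sqrt{\delta/\theta}}{1 - \gamma_{\mathcal{G}}} \ge \delta - \gamma_{\mathcal{G}}\sqrt{\delta/\theta} > \delta - \epsilon^{3/2} \cdot \frac{1}{\sqrt{\epsilon}} = \delta - \epsilon > 1 - R - \epsilon . \]

Thus, the code $(\mathbb{C})_\Phi$ approaches the Singleton bound when
$\epsilon \to 0$. In addition, if $q$ and $\Delta$ are selected to be (no larger
than) $O(1/\epsilon^3)$, then the alphabet $\Phi$ has size

\[ |\Phi| = q^{r\Delta} = 2^{O((\log(1/\epsilon))/\epsilon^3)} . \]

□

From Example 3.1 we can state the following corollary.

Corollary 3.4: For any designed rate $R \in (0, 1]$ and sufficiently
small $\epsilon > 0$ there is an infinite family of codes $(\mathbb{C})_\Phi$
of rate at least $R$ and relative minimum distance greater than
$1 - R - \epsilon$, over an alphabet of size as in (2).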
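The parameter choice of Example 3.1 can be traced numerically. The sketch below is a rough illustration under two simplifying assumptions that are not in the text: the degree is taken as the smallest integer $\Delta \ge 4/\epsilon^3$ without checking that a Ramanujan graph of that degree exists, and $\delta$ is approximated by $1 - R$ (for an MDS code it is slightly larger). The function name `example_parameters` is hypothetical.

```python
import math

def example_parameters(R, eps):
    """Follow Example 3.1: theta = eps, Delta >= 4/eps^3, Ramanujan gamma bound."""
    Delta = math.ceil(4 / eps ** 3)
    theta = eps
    delta = 1 - R                               # assumption: ignore the +1/Delta term
    gamma = 2 * math.sqrt(Delta - 1) / Delta    # Ramanujan upper bound on gamma
    dist = (delta - gamma * math.sqrt(delta / theta)) / (1 - gamma)
    return {"Delta": Delta, "gamma": gamma, "distance_bound": dist}

params = example_parameters(R=0.5, eps=0.1)
```

With $R = 0.5$ and $\epsilon = 0.1$ the degree is about 4000, the bound on $\gamma_{\mathcal{G}}$ is just below $\epsilon^{3/2}$, and the distance bound comes out above $1 - R - \epsilon = 0.4$.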
IV. Decoding algorithm

Let $\mathbb{C} = (\mathcal{G}, \mathcal{C}' : \mathcal{C}'')$ be defined over $F = \mathrm{GF}(q)$ as in
Section II. Figure 1 presents an adaptation of the iterative
decoder of Sipser and Spielman [16] and Zemor [20] to the
code $(\mathbb{C})_\Phi$, with the additional feature of handling erasures
(as well as errors over $\Phi$): as we show in Theorem 4.1 below,
the algorithm corrects any pattern of $t$ errors and $\rho$ erasures,
provided that $t + (\rho/2) < \beta n$, where

\[ \beta = \frac{(\delta/2) - \gamma_{\mathcal{G}}\sqrt{\delta/\theta}}{1 - \gamma_{\mathcal{G}}} . \]

Note that $\beta$ equals approximately half the lower bound in
Theorem 3.1. The value of $\nu$ in the algorithm, which is
specified in Theorem 4.1 below, grows logarithmically with
$n$.

We use the notation "?" to stand for an erasure. The
algorithm in Figure 1 makes use of a word $z = (z_e)_{e \in E}$
over $F \cup \{?\}$ that is initialized according to the contents of
the received word $y$ as follows. Each sub-block $(z)_{E(u)}$ that
corresponds to a non-erased entry $y_u$ of $y$ is initialized to
the codeword $\mathcal{E}(y_u)$ of $\mathcal{C}'$. The remaining sub-blocks $(z)_{E(u)}$
are initialized as erased words of length $\Delta$. Iterations $i =
3, 5, 7, \ldots$ use an error-correcting decoder $\mathcal{D}' : F^\Delta \to \mathcal{C}'$ that
recovers correctly any pattern of less than $\theta\Delta/2$ errors (over
$F$), and iterations $i = 2, 4, 6, \ldots$ use a combined error-erasure
decoder $\mathcal{D}'' : (F \cup \{?\})^\Delta \to \mathcal{C}''$ that recovers correctly any
pattern of $a$ errors and $b$ erasures, provided that $2a + b < \delta\Delta$
($b$ will be positive only when $i = 2$).
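The message flow of the decoder described above can be illustrated on a deliberately tiny instance. The sketch below is not the decoder analyzed in this paper; it is a toy under stated assumptions: the graph is $K_{n,n}$ (so $\Delta = n$), both constituent codes are the binary repetition code (so $\Phi = F$ and $\mathcal{E}$ repeats its input), and a single hypothetical helper `decode_repetition` plays the role of both $\mathcal{D}'$ and $\mathcal{D}''$.

```python
ERASED = "?"

def decode_repetition(block):
    """Bounded-distance error-erasure decoder for the repetition code:
    return the all-b word if 2*errors + erasures < Delta, else the block as is."""
    Delta = len(block)
    erasures = sum(1 for s in block if s == ERASED)
    for b in (0, 1):
        errors = sum(1 for s in block if s != ERASED and s != b)
        if 2 * errors + erasures < Delta:
            return [b] * Delta
    return list(block)

def decode(y, nu=3):
    """y: list of n symbols in {0, 1, '?'}; z[u][v] is the symbol on edge (u, v)."""
    n = len(y)
    # Initialize: sub-block of u is E(y_u), or fully erased if y_u is an erasure.
    z = [[ERASED] * n if y[u] == ERASED else [y[u]] * n for u in range(n)]
    for i in range(2, nu + 1):
        if i % 2 == 1:    # odd iteration: vertices of V' (rows of z)
            z = [decode_repetition(row) for row in z]
        else:             # even iteration: vertices of V'' (columns of z)
            cols = [decode_repetition([z[u][v] for u in range(n)]) for v in range(n)]
            z = [[cols[v][u] for v in range(n)] for u in range(n)]
    # Output psi_E(z) if z is a codeword (here: all entries equal), else 'error'.
    values = {s for row in z for s in row}
    if len(values) == 1 and ERASED not in values:
        return [z[u][0] for u in range(n)]
    return None

corrected = decode([1, 0, 1])          # one error
filled = decode([0, ERASED, 0])        # one erasure
failed = decode([1, ERASED, 0])        # one error plus one erasure: too many
```

In the first call the even iteration (columns) already removes the single error; in the last call neither side can decide, and the sketch reports failure by returning `None`.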

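Theorem 4.1: Suppose that

\[ \sqrt{\theta\delta} > 2\gamma_{\mathcal{G}} > 0 , \]

and fix $\sigma$ to be a positive real number such that

\[ \sigma < \beta = \frac{(\delta/2) - \gamma_{\mathcal{G}}\sqrt{\delta/\theta}}{1 - \gamma_{\mathcal{G}}} . \tag{7} \]

If

\[ \nu = 2 \left\lfloor \log \left( \frac{\beta\sqrt{\sigma n} - \sigma}{\beta - \sigma} \right) \right\rfloor + 3 \tag{8} \]

(with the base of the logarithm equal to $(\theta\delta)/(4\gamma_{\mathcal{G}}^2)$),
then the decoder in Figure 1 recovers correctly any pattern of
$t$ errors (over $\Phi$) and $\rho$ erasures, provided that

\[ t + \frac{\rho}{2} \le \sigma n . \tag{9} \]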
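The quantities in Theorem 4.1 are easy to evaluate. The sketch below computes $\beta$ and an iteration count, taking $\nu$ to be the smallest odd integer that exceeds $2\log\big((\beta\sqrt{\sigma n} - \sigma)/(\beta - \sigma)\big) + 1$ with logarithm base $(\theta\delta)/(4\gamma_{\mathcal{G}}^2)$, which is the condition derived at the end of the proof. The function names are hypothetical, and the code assumes $n$ is large enough that the argument of the logarithm is positive.

```python
import math

def beta(delta, theta, gamma):
    """beta = ((delta/2) - gamma*sqrt(delta/theta)) / (1 - gamma)."""
    return (delta / 2 - gamma * math.sqrt(delta / theta)) / (1 - gamma)

def iterations(n, sigma, delta, theta, gamma):
    """Smallest odd nu with nu > 2*log_base(arg) + 1, base = theta*delta/(4 gamma^2)."""
    b = beta(delta, theta, gamma)
    if not (math.sqrt(theta * delta) > 2 * gamma > 0 and 0 < sigma < b):
        raise ValueError("hypotheses of Theorem 4.1 are not met")
    base = theta * delta / (4 * gamma ** 2)
    arg = (b * math.sqrt(sigma * n) - sigma) / (b - sigma)
    return 2 * math.floor(math.log(arg, base)) + 3

nu = iterations(n=10000, sigma=0.1, delta=0.5, theta=0.5, gamma=0.05)
```

For these illustrative values, $\beta \approx 0.21$, the logarithm base is 25, and five iterations suffice; increasing $n$ by a factor of $25^2$ adds two iterations, reflecting the logarithmic growth of $\nu$ with $n$.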



The proof of the theorem makes use of the following lemma.

Lemma 4.2: Let $\chi$, $\sigma$, and $\tau$ be as in Lemma 3.2 and
suppose that the restriction of $\chi$ to $V''$ is not identically zero
and that $\gamma_{\mathcal{G}} > 0$. Let $\delta$ be a real number for which the
following condition is satisfied for every $v \in V''$:

\[ \chi(v) > 0 \;\Longrightarrow\; \sum_{u \in \mathcal{N}(v)} \chi(u) \ge \frac{\delta\Delta}{2} . \]

Then

\[ \sqrt{\frac{\sigma}{\tau}} \ge \frac{(\delta/2) - (1-\gamma_{\mathcal{G}})\sigma}{\gamma_{\mathcal{G}}} . \]

The proof of Lemma 4.2 can be found in Appendix A. This
lemma implies an upper bound on $\tau$, in terms of $\sigma$; it can be
verified that this bound is always at least as tight as Lemma 5
in [20].

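Proof of Theorem 4.1: For $i \ge 2$, let $U_i$ be the value of the
set $U$ at the end of iteration $i$ in Figure 1, and let $S_i$ be the
set of all vertices $u \in U_i$ such that $(z)_{E(u)}$ is in error at the
end of that iteration. Let $\chi_1 : (V' \cup V'') \to \{0, \frac{1}{2}, 1\}$ be the
function

\[ \chi_1(u) = \begin{cases} 1 & \text{if } u \in V' \text{ and } y_u \text{ is in error} \\ \frac{1}{2} & \text{if } u \in V' \text{ and } y_u \text{ is an erasure} \\ 0 & \text{otherwise} \end{cases} , \]

and, for $i \ge 2$, define the function $\chi_i : (V' \cup V'') \to \{0, \frac{1}{2}, 1\}$
recursively by

\[ \chi_i(u) = \begin{cases} 1 & \text{if } u \in S_i \\ 0 & \text{if } u \in U_i \setminus S_i \\ \chi_{i-1}(u) & \text{if } u \in U_{i-1} \end{cases} , \]

where $U_1 = V'$.
Denote

\[ \sigma_i = \frac{1}{n} \sum_{u \in U_i} \chi_i(u) . \]

Obviously, $\sigma_1 n = t + (\rho/2)$ and, so, by (9) we have $\sigma_1 \le \sigma$.

Let $\ell$ be the smallest positive integer (possibly $\infty$) such
that $\sigma_\ell = 0$. Since both $\mathcal{D}'$ and $\mathcal{D}''$ are bounded-distance
decoders, a vertex $v \in U_i$ can belong to $S_i$ for even $i \ge 2$,
only if the sum $\sum_{u \in \mathcal{N}(v)} \chi_i(u)$ (which equals the sum
$\sum_{u \in \mathcal{N}(v)} \chi_{i-1}(u)$) is at least $\delta\Delta/2$. Similarly, a vertex $v \in U_i$
belongs to $S_i$ for odd $i > 1$, only if $\sum_{u \in \mathcal{N}(v)} \chi_i(u) \ge \theta\Delta/2$.
It follows that the function $\chi_i$ satisfies the conditions of
Lemma 4.2 (with $\theta$ taken instead of $\delta$ for odd $i$) and, so,

\[ \sqrt{\frac{\sigma_{i-1}}{\sigma_i}} \ge \begin{cases} \dfrac{\delta}{2\gamma_{\mathcal{G}}} - \dfrac{1-\gamma_{\mathcal{G}}}{\gamma_{\mathcal{G}}}\,\sigma_{i-1} & \text{for even } 0 < i < \ell \\[2ex] \dfrac{\theta}{2\gamma_{\mathcal{G}}} - \dfrac{1-\gamma_{\mathcal{G}}}{\gamma_{\mathcal{G}}}\,\sigma_{i-1} & \text{for odd } 1 < i < \ell \end{cases} . \tag{10} \]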

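Using the condition $\sigma_1 \le \sigma < \beta$, it can be verified by
induction on $i \ge 2$ that

\[ \frac{\sigma_{i-1}}{\sigma_i} \ge \begin{cases} \delta/\theta & \text{for even } 0 < i < \ell \\ \theta/\delta & \text{for odd } 1 < i < \ell \end{cases} . \]

Hence, for every $i > 2$,

\[ \frac{\sigma_i}{\sigma_{i-2}} \le \frac{\theta}{\delta} \cdot \frac{\delta}{\theta} = 1 ; \tag{11} \]

in particular, $\sigma_i \le \sigma$ for odd $i$ and $\sigma_i \le \sigma_2$ for even $i$.
Incorporating these inequalities into (10) yields

\[ \frac{1}{\sqrt{\sigma_i}} \ge \frac{\delta}{2\gamma_{\mathcal{G}}\sqrt{\sigma_{i-1}}} - \frac{1-\gamma_{\mathcal{G}}}{\gamma_{\mathcal{G}}}\sqrt{\sigma} \quad \text{for even } 0 < i < \ell \tag{12} \]

and

\[ \frac{1}{\sqrt{\sigma_i}} \ge \frac{\theta}{2\gamma_{\mathcal{G}}\sqrt{\sigma_{i-1}}} - \frac{1-\gamma_{\mathcal{G}}}{\gamma_{\mathcal{G}}}\sqrt{\sigma_2} \quad \text{for odd } 1 < i < \ell . \tag{13} \]

By combining (12) and (13), we get that for even $i > 0$,

\[ \frac{2\gamma_{\mathcal{G}}}{\theta} \cdot \frac{1}{\sqrt{\sigma_{i+1}}} \ge \frac{1}{\sqrt{\sigma_i}} - \frac{2(1-\gamma_{\mathcal{G}})}{\theta}\sqrt{\sigma_2} \ge \frac{\delta}{2\gamma_{\mathcal{G}}\sqrt{\sigma_{i-1}}} - \frac{1-\gamma_{\mathcal{G}}}{\gamma_{\mathcal{G}}}\sqrt{\sigma} - \frac{2(1-\gamma_{\mathcal{G}})}{\theta}\sqrt{\sigma_2} , \]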



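Input: Received word $y = (y_u)_{u \in V'}$ in $(\Phi \cup \{?\})^n$.
Initialize: For $u \in V'$ do: $(z)_{E(u)} \leftarrow \mathcal{E}(y_u)$ if $y_u \in \Phi$, and $(z)_{E(u)} \leftarrow ??\ldots?$ if $y_u = ?$.
Iterate: For $i = 2, 3, \ldots, \nu$ do:

(a) If $i$ is odd then $U = V'$ and $\mathcal{D} = \mathcal{D}'$, else $U = V''$ and $\mathcal{D} = \mathcal{D}''$.

(b) For every $u \in U$ do: $(z)_{E(u)} \leftarrow \mathcal{D}((z)_{E(u)})$.
Output: $\psi_{\mathcal{E}}(z)$ if $z \in \mathbb{C}$ (and declare 'error' otherwise).



Fig. 1. Decoder for $(\mathbb{C})_\Phi$.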



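or

\[ \frac{1}{\sqrt{\sigma_{i+1}}} \ge \frac{\theta\delta}{4\gamma_{\mathcal{G}}^2} \cdot \frac{1}{\sqrt{\sigma_{i-1}}} - \frac{1-\gamma_{\mathcal{G}}}{\gamma_{\mathcal{G}}} \left( \frac{\theta}{2\gamma_{\mathcal{G}}}\sqrt{\sigma} + \sqrt{\sigma_2} \right) \]
\[ \ge \frac{\theta\delta}{4\gamma_{\mathcal{G}}^2} \cdot \frac{1}{\sqrt{\sigma_{i-1}}} - \frac{1-\gamma_{\mathcal{G}}}{\gamma_{\mathcal{G}}} \left( \frac{\theta}{2\gamma_{\mathcal{G}}} + \sqrt{\theta/\delta} \right) \sqrt{\sigma} \]
\[ = \frac{\theta\delta}{4\gamma_{\mathcal{G}}^2} \left( \frac{1}{\sqrt{\sigma_{i-1}}} - \frac{\sqrt{\sigma}}{\beta} \right) + \frac{\sqrt{\sigma}}{\beta} , \tag{14} \]

where the second inequality follows from $\sigma_2 \le \sigma \cdot \theta/\delta$
(see (11)), and the (last) equality follows from the next chain
of equalities:

\[ \frac{1-\gamma_{\mathcal{G}}}{\gamma_{\mathcal{G}}} \left( \frac{\theta}{2\gamma_{\mathcal{G}}} + \sqrt{\theta/\delta} \right) = \frac{(1-\gamma_{\mathcal{G}}) \left( \frac{\theta\delta}{4\gamma_{\mathcal{G}}^2} - 1 \right)}{(\delta/2) - \gamma_{\mathcal{G}}\sqrt{\delta/\theta}} = \left( \frac{\theta\delta}{4\gamma_{\mathcal{G}}^2} - 1 \right) \frac{1}{\beta} . \]

Consider the following first-order linear recurring sequence
$(\Lambda_j)_{j \ge 0}$ that satisfies

\[ \Lambda_{j+1} = \frac{\theta\delta}{4\gamma_{\mathcal{G}}^2} \left( \Lambda_j - \frac{\sqrt{\sigma}}{\beta} \right) + \frac{\sqrt{\sigma}}{\beta} , \quad j \ge 0 , \]

where $\Lambda_0 = 1/\sqrt{\sigma}$. From (14) we have $1/\sqrt{\sigma_{i+1}} \ge \Lambda_{i/2}$ for
even $i \ge 0$. By solving the recurrence for $(\Lambda_j)_j$, we obtain

\[ \Lambda_j = \left( \frac{\theta\delta}{4\gamma_{\mathcal{G}}^2} \right)^{j} \left( \frac{1}{\sqrt{\sigma}} - \frac{\sqrt{\sigma}}{\beta} \right) + \frac{\sqrt{\sigma}}{\beta} . \tag{15} \]

From the condition (7) we thus get that $\sigma_{i+1}$ decreases
exponentially with (even) $i$. A sufficient condition for ending
the decoding correctly after $\nu$ iterations is having $\sigma_\nu < 1/n$,
or

\[ \frac{1}{\sqrt{\sigma_\nu}} > \sqrt{n} . \]

We require therefore that $\nu$ be such that

\[ \frac{1}{\sqrt{\sigma_\nu}} \ge \Lambda_{(\nu-1)/2} = \frac{1}{\sqrt{\sigma}} \left( \left( \frac{\theta\delta}{4\gamma_{\mathcal{G}}^2} \right)^{(\nu-1)/2} \left( 1 - \frac{\sigma}{\beta} \right) + \frac{\sigma}{\beta} \right) > \sqrt{n} . \]

The latter inequality can be rewritten as

\[ \left( \frac{\theta\delta}{4\gamma_{\mathcal{G}}^2} \right)^{(\nu-1)/2} > \frac{\sqrt{n\sigma} - (\sigma/\beta)}{1 - (\sigma/\beta)} = \frac{\beta\sqrt{n\sigma} - \sigma}{\beta - \sigma} , \]

thus yielding

\[ \nu > 2 \log \left( \frac{\beta\sqrt{n\sigma} - \sigma}{\beta - \sigma} \right) + 1 , \]

where the base of the logarithm equals $(\theta\delta)/(4\gamma_{\mathcal{G}}^2)$. In summary,
the decoding will end with the correct codeword after

\[ \nu = 2 \left\lfloor \log \left( \frac{\beta\sqrt{n\sigma} - \sigma}{\beta - \sigma} \right) \right\rfloor + 3 \]

iterations (where the base of the logarithm again equals
$(\theta\delta)/(4\gamma_{\mathcal{G}}^2)$). □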

In Lemma B.1, which appears in Appendix B, it is shown
that the number of actual applications of the decoders $\mathcal{D}'$ and
$\mathcal{D}''$ in the algorithm in Figure 1 can be bounded from above
by $\omega \cdot n$, where



i ( 06 



1 + ^ 



4t| 
95 



Thus, if $\theta$ and $\delta$ are fixed and the ratio $\sigma/\beta$ is bounded away
from 1 and $\mathcal{G}$ is a Ramanujan graph, then the value of $\omega$
is bounded from above by an absolute constant (independent
of $\Delta$).

The algorithm in Figure 1 allows us to use GMD decoding
in cases where $(\mathbb{C})_\Phi$ is used as an outer code in a concatenated
code. In such a concatenated code, the size of the inner
code is $|\Phi|$ and, thus, it does not grow with the length $n$ of
$(\mathbb{C})_\Phi$. A GMD decoder will apply the algorithm in Figure 1 a
number of times that is proportional to the minimum distance
of the inner code. Thus, if the inner code has rate that is
bounded away from zero, then the GMD decoder will have
time complexity that grows linearly with the overall code
length. Furthermore, if $\mathcal{C}'$, $\mathcal{C}''$, and the inner code are codes
that have a polynomial-time bounded-distance decoder (e.g.,
if they are GRS codes), then the multiplying constant in the
linear expression of the time complexity (when measured in
operations in $F$) is POLY$(\Delta)$. For the choice of parameters
in Example 3.1, this constant is POLY$(1/\epsilon)$ and, since $F$ is
chosen in that example to have size $O(1/\epsilon^3)$, each operation
in $F$ can in turn be implemented by POLY$(\log(1/\epsilon))$ bit
operations. (We remark that in all our complexity estimates,
we assume that the graph $\mathcal{G}$ is "hard-wired" so that we can
ignore the complexity of figuring out the set of incident edges
of a given vertex in $\mathcal{G}$. Along these lines, we assume that each
access to an entry takes constant time, even though the length
of the index of that entry may grow logarithmically with the
code length. See the discussion in [16, Section II].)

When the inner code is taken as $\mathcal{C}'$, the concatenation results
in the code $\mathbb{C} = (\mathcal{G}, \mathcal{C}' : \mathcal{C}'')$ (of length $\Delta n$) over $F$, and the
(linear-time) correctable fraction of errors is then the product
$\theta \cdot \sigma$, for any positive real $\sigma$ that satisfies (7). A special case
of this result, for $F = \mathrm{GF}(2)$ and $\mathcal{C}' = \mathcal{C}''$, was presented in
our earlier work [17], yet the analysis therein was different.
A linear-time decoder for $\mathbb{C}$ was also presented by Barg and
Zemor in [7], except that their decoder requires finding a
codeword that minimizes some weighted distance function,
and we are unaware of a method that performs this task in
time complexity that is POLY$(\Delta)$, even when $\mathcal{C}'$ and $\mathcal{C}''$ have
a polynomial-time bounded-distance decoder.

V. Construction which is also linear-time encodable

In this section, we use the construction $(\mathbb{C})_\Phi$ of Section II
as a building block in obtaining a second construction, which
satisfies all properties (P1)-(P3) over an alphabet whose size
is given by (2).

A. Outline of the construction 

Let $\mathbb{C} = (\mathcal{G}, \mathcal{C}' : \mathcal{C}'')$ be defined over $F = \mathrm{GF}(q)$ as
in Section II. The first simple observation that provides the
intuition behind the upcoming construction is that the encoding
of $\mathbb{C}$, and hence of $(\mathbb{C})_\Phi$, can be easily implemented in linear
time if the code $\mathcal{C}'$ has rate $r = 1$, in which case $\Phi = F^\Delta$.
The definition of $\mathbb{C}$ then reduces to

\[ \mathbb{C} = \left\{ c \in F^{|E|} \;:\; (c)_{E(v)} \in \mathcal{C}'' \text{ for every } v \in V'' \right\} . \]

We can implement an encoder of $\mathbb{C}$ as follows. Let
$\mathcal{E}'' : F^{R\Delta} \to \mathcal{C}''$ be some one-to-one encoding mapping of $\mathcal{C}''$.
Given an information word $\eta$ in $F^{R\Delta n}$, it is first recast into a
word of length $n$ over $F^{R\Delta}$ by sub-dividing it into sub-blocks
$\eta_v \in F^{R\Delta}$ that are indexed by $v \in V''$; then a codeword
$c \in \mathbb{C}$ is computed by

\[ (c)_{E(v)} = \mathcal{E}''(\eta_v) , \quad v \in V'' . \]

By selecting $\mathcal{E}$ in (3) as the identity mapping, we get that the
respective codeword $x = (x_u)_{u \in V'} = \psi_{\mathcal{E}}(c)$ in $(\mathbb{C})_\Phi$ is

\[ x_u = (c)_{E(u)} , \quad u \in V' . \]

Thus, each of the $\Delta$ entries (over $F$) of the sub-block $x_u$ can
be associated with a vertex $v \in \mathcal{N}(u)$, and the value assigned
to that entry is equal to one of the entries in $\mathcal{E}''(\eta_v)$.

While having $\mathcal{C}' = \Phi$ ($= F^\Delta$) allows easy encoding, the
minimum distance of the resulting code $(\mathbb{C})_\Phi$ is obviously
poor. To resolve this problem, we insert into the construction
another linear $[\Delta, r_0\Delta, \theta_0\Delta]$ code $\mathcal{C}_0$ over $F$. Let $H_0$ be some
$((1-r_0)\Delta) \times \Delta$ parity-check matrix of $\mathcal{C}_0$ and for a vector
$h \in F^{(1-r_0)\Delta}$, denote by $\mathcal{C}_0(h)$ the following coset of $\mathcal{C}_0$
within $\Phi$:

\[ \mathcal{C}_0(h) = \{ v \in \Phi \;:\; H_0 v = h \} . \]

Fix now a list of vectors $s = (h_u)_{u \in V'}$ where $h_u \in F^{(1-r_0)\Delta}$,
and define the subset $\mathbb{C}(s)$ of $\mathbb{C}$ by

\[ \mathbb{C}(s) = \left\{ c \in \mathbb{C} \;:\; (c)_{E(u)} \in \mathcal{C}_0(h_u) \text{ for every } u \in V' \right\} ; \]

accordingly, define the subset $(\mathbb{C}(s))_\Phi$ of $(\mathbb{C})_\Phi$ by

\[ (\mathbb{C}(s))_\Phi = \left\{ \psi_{\mathcal{E}}(c) = \big( (c)_{E(u)} \big)_{u \in V'} \;:\; c \in \mathbb{C}(s) \right\} . \]

Now, if $s$ is all-zero, then $\mathbb{C}(s)$ coincides with the code $\mathbb{C}(0) =
(\mathcal{G}, \mathcal{C}_0 : \mathcal{C}'')$; otherwise, $\mathbb{C}(s)$ is either empty or is a coset of
$\mathbb{C}(0)$, where $\mathbb{C}(0)$ is regarded as a linear subspace of $\mathbb{C}$ over
$F$. From this observation we conclude that the lower bound in
Theorem 3.1 applies to any nonempty subset $(\mathbb{C}(s))_\Phi$, except
that we need to replace $\theta$ by $\theta_0$.

In addition, a simple modification in the algorithm in
Figure 1 adapts it to decode $(\mathbb{C}(s))_\Phi$ so that Theorem 4.1 holds
(again under the change $\theta \leftrightarrow \theta_0$): during odd iterations $i$, we
apply to each sub-block $(z)_{E(u)}$ a bounded-distance decoder
of $\mathcal{C}_0(h_u)$, instead of the decoder $\mathcal{D}'$.

Therefore, our strategy in designing the linear-time encodable
codes will be as follows. The raw data will first be
encoded into a codeword $c$ of $\mathbb{C}$ (where $\mathcal{C}' = \Phi$). Then we
compute the $n$ vectors

\[ h_u = H_0 \cdot (c)_{E(u)} , \quad u \in V' , \]

and produce the list $s = (h_u)_{u \in V'}$; clearly, $c$ belongs to $\mathbb{C}(s)$.
The list $s$ will then undergo additional encoding stages, and
the result will be merged with $\psi_{\mathcal{E}}(c)$ to produce the final
codeword. The parameters of $\mathcal{C}_0$, which determine the size
of $s$, will be chosen so that the overhead due to $s$ will be
negligible.

During decoding, $s$ will be recovered first, and then we will
apply the aforementioned adaptation to $(\mathbb{C}(s))_\Phi$ of the decoder
in Figure 1 to reconstruct the information word $\eta$.
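The first two stages of this strategy (encoding along the vertices of $V''$, then computing the list $s$ of syndromes at the vertices of $V'$) can be sketched on a toy instance. Everything concrete below is an assumption made for illustration: the graph is $K_{3,3}$, $\mathcal{C}''$ is the binary $[3,2,2]$ parity-check code, and $\mathcal{C}_0$ is the binary $[3,1,3]$ repetition code with the parity-check matrix `H0`. The function names are hypothetical.

```python
N = 3  # toy graph K_{3,3}: Delta = n = 3 (illustrative assumption)

# Parity-check matrix of C_0, here the binary [3, 1, 3] repetition code.
H0 = [[1, 1, 0],
      [0, 1, 1]]

def encode_columns(eta):
    """eta[v] is a pair of bits; E'' appends a parity bit (code C'').
    Returns c as rows: c[u][v] is the symbol on edge (u, v)."""
    cols = [[a, b, (a + b) % 2] for (a, b) in eta]
    return [[cols[v][u] for v in range(N)] for u in range(N)]

def syndromes(c):
    """h_u = H_0 * (c)_{E(u)} over GF(2), for every u in V'."""
    return [tuple(sum(h * x for h, x in zip(row, c[u])) % 2 for row in H0)
            for u in range(N)]

eta = [(1, 0), (0, 1), (1, 1)]
c = encode_columns(eta)    # x_u = c[u], since E is the identity mapping
s = syndromes(c)           # the list s = (h_u); c lies in the subset C(s)
```

Encoding touches each edge once and each syndrome costs a fixed number of operations per vertex, which is the reason the overall procedure is linear in $n$ for fixed $\Delta$.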

B. Details of the construction 

We now describe the construction in more detail. We let $F$
be the field GF($q$) and $\Delta_1$ and $\Delta_2$ be positive integers. The
construction makes use of two bipartite regular graphs,

\[ \mathcal{G}_1 = (V' : V'', E_1) \quad \text{and} \quad \mathcal{G}_2 = (V' : V'', E_2) , \]

of degrees $\Delta_1$ and $\Delta_2$, respectively. Both graphs have the
same number of vertices; in fact, we are making a stronger
assumption whereby both graphs are defined over the same
set of vertices. We denote by $n$ the size of $V'$ (or $V''$) and
by $\Phi_1$ and $\Phi_2$ the alphabets $F^{\Delta_1}$ and $F^{\Delta_2}$, respectively. The
notations $E_1(u)$ and $E_2(u)$ will stand for the sets of edges
that are incident with a vertex $u$ in $\mathcal{G}_1$ and $\mathcal{G}_2$, respectively.

We also assume that we have at our disposal the following
four codes:

• a linear $[\Delta_1, r_0\Delta_1, \theta_0\Delta_1]$ code $\mathcal{C}_0$ over $F$;
• a linear $[\Delta_1, R_1\Delta_1, \delta_1\Delta_1]$ code $\mathcal{C}_1$ over $F$;
• a linear $[\Delta_2, R_2\Delta_2, \delta_2\Delta_2]$ code $\mathcal{C}_2$ over $F$;
• a code $\mathcal{C}_m$ of length $n$ and rate $r_m$ over the alphabet
$\Phi_m = F^{R_2\Delta_2}$.

The rates of these codes need to satisfy the relation

\[ (1 - r_0)\Delta_1 = r_m R_2 \Delta_2 , \]

and the code $\mathcal{C}_m$ is assumed to have the following properties:

1) Its rate is bounded away from zero: there is a universal
positive constant $\kappa$ such that $r_m \ge \kappa$.

2) $\mathcal{C}_m$ is linear-time encodable, and the encoding time per
symbol is POLY$(\log|\Phi_m|)$.

3) $\mathcal{C}_m$ has a decoder that recovers in linear time any pattern
of up to $\mu n$ errors (over the alphabet $\Phi_m$), where $\mu$ is
a universal positive constant. The time complexity per
symbol of the decoder is POLY$(\log|\Phi_m|)$.

(By a universal constant we mean a value that does not depend
on any other parameter, not even on the size of $\Phi_m$.) For
example, we can select as $\mathcal{C}_m$ the code of Spielman in [18],
in which case $\kappa$ can be taken as 1/4.

Based on these ingredients, we introduce the codes

\[ \mathbb{C}_1 = (\mathcal{G}_1, \Phi_1 : \mathcal{C}_1) \quad \text{and} \quad \mathbb{C}_2 = (\mathcal{G}_2, \Phi_2 : \mathcal{C}_2) \]

over $F$. The code $\mathbb{C}_1$ will play the role of the code $\mathbb{C}$ as
outlined in Section V-A, whereas the codes $\mathcal{C}_m$ and $\mathbb{C}_2$ will
be utilized for the encoding of the list $s$ that was described
there.

The overall construction, which we denote by $\mathbf{C}$, is now
defined as the set of all words of length $n$ over the alphabet

\[ \Phi = \Phi_1 \times \Phi_2 \]

that are obtained by applying the encoding algorithm in
Figure 2 to information words $\eta$ of length $n$ over $F^{R_1\Delta_1}$.
A schematic diagram of the algorithm is shown in Figure 3.
(In this algorithm, we use a notational convention whereby
entries of information words $\eta$ are indexed by $V''$, and so are
codewords of $\mathcal{C}_m$.)

From the discussion in Section V-A and from the assumption
on the code $\mathcal{C}_m$ it readily follows that the encoder in
Figure 2 can be implemented in linear time, where the encoding
complexity per symbol (when measured in operations in
$F$) is POLY$(\Delta_1, \Delta_2)$. The rate of $\mathbf{C}$ is also easy to compute:
the encoder in Figure 2 maps, in a one-to-one manner, an
information word of length $n$ over an alphabet of size $q^{R_1\Delta_1}$,
into a codeword of length $n$ over an alphabet $\Phi$ of size
$q^{\Delta_1+\Delta_2}$. Thus, the rate of $\mathbf{C}$ is

\[ \frac{R_1\Delta_1 n}{(\Delta_1 + \Delta_2) n} = \frac{R_1}{1 + (\Delta_2/\Delta_1)} . \tag{16} \]

In the next section, we show how the parameters of $\mathbf{C}$ can be
selected so that it becomes nearly-MDS and also linear-time
decodable.
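The rate expression (16) shows that the rate loss relative to $R_1$ is governed by the ratio $\Delta_2/\Delta_1$. The sketch below evaluates it exactly; the function name and the sample degrees are illustrative assumptions.

```python
from fractions import Fraction

def overall_rate(R1, Delta1, Delta2):
    """Rate (16): R1 * Delta1 / (Delta1 + Delta2) = R1 / (1 + Delta2/Delta1)."""
    return Fraction(R1) / (1 + Fraction(Delta2, Delta1))

# With R1 = 1/2 and Delta2/Delta1 = 1/10, the rate is (1/2)/(11/10) = 5/11.
rate = overall_rate(Fraction(1, 2), 100, 10)
```

The smaller the second graph is relative to the first, the closer the rate stays to $R_1$; with $\Delta_2 = 0$ the expression reduces to $R_1$.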

C. Design, decoding, and analysis 

We will select the parameters of $\mathbf{C}$ quite similarly to
Example 3.1. We assume that the rates $R_1$ and $R_2$ of $\mathcal{C}_1$ and
$\mathcal{C}_2$ are the same and are equal to some prescribed value $R$,
and define

\[ \alpha_R = 8 \cdot (1-R) \cdot \max\{ R/\mu, 2/\kappa \} \]

(notice that $\alpha_R$ can be bounded from above by a universal
constant that does not depend on $R$, e.g., by $16/\min\{2\mu, \kappa\}$).
We set $\theta_0 = \kappa \cdot \epsilon$ for some positive $\epsilon < R$ (in which case
$1 - r_0 < \kappa \cdot \epsilon$), and then select $q$, $\Delta_1$, and $\Delta_2$ so that
$q > \Delta_1 \ge \alpha_R/\epsilon^3$ and

\[ \Delta_2 = \frac{(1-r_0)\Delta_1}{r_m R} \quad (< \Delta_1) ; \tag{17} \]

yet we also assume that $q$ is (no larger than) $O(1/\epsilon^3)$. The
graphs $\mathcal{G}_1$ and $\mathcal{G}_2$ are taken as Ramanujan graphs and $\mathcal{C}_0$,
$\mathcal{C}_1$, and $\mathcal{C}_2$ are taken as GRS codes over $F$. (Requiring
that both $\Delta_1$ and $\Delta_2$ be valid degrees of Ramanujan graphs
imposes some restrictions on the value $(1-r_0)/(r_m R)$. These
restrictions can be satisfied by tuning the precise rate of $\mathcal{C}_m$
last.)

Given this choice of parameters, we obtain from (17) that
$\Delta_2/\Delta_1 < \epsilon/R$ and, so, the rate (16) of $\mathbf{C}$ is greater than

\[ \frac{R}{1 + (\epsilon/R)} > R - \epsilon . \tag{18} \]

The alphabet size of $\mathbf{C}$ is

\[ |\Phi| = |\Phi_1| \cdot |\Phi_2| = q^{\Delta_1+\Delta_2} = 2^{O((\log(1/\epsilon))/\epsilon^3)} , \]

as in (2), where we have absorbed into the $O(\cdot)$ term the
constants $\kappa$ and $\mu$.

Our next step in the analysis of the code $\mathbf{C}$ consists of
showing that there exists a linear-time decoder which recovers
correctly any pattern of $t$ errors and $\rho$ erasures, provided that

\[ 2t + \rho \le (1 - R - \epsilon) n . \tag{19} \]

This, in turn, will also imply that the relative minimum distance
of $\mathbf{C}$ is greater than $1 - R - \epsilon$, thus establishing with (18)
the fact that $\mathbf{C}$ is nearly-MDS.

Let $x = (x_u)_{u \in V'}$ be the transmitted codeword of $\mathbf{C}$, where

\[ x_u = \big( (c)_{E_1(u)}, (d)_{E_2(u)} \big) , \]

and let $y = (y_u)_{u \in V'}$ be the received word; each entry $y_u$
takes the form $(y_{u,1}, y_{u,2})$, where $y_{u,1} \in \Phi_1 \cup \{?\}$ and
$y_{u,2} \in \Phi_2 \cup \{?\}$. Consider the application of the algorithm in
Figure 4 to $y$, assuming that $y$ contains $t$ errors and $\rho$ erasures,
where $2t + \rho \le (1 - R - \epsilon) n$.

Step (D1) is the counterpart of the initialization step in
Figure 1 (the entries of $z$ here are indexed by the edges of
$\mathcal{G}_2$).

The role of Step (D2) is to compute a word $\hat{w} \in \Phi_m^n$ that
is close to the codeword $w$ of $\mathcal{C}_m$, which was generated in
Step (E3) of Figure 2. Step (D2) uses the inverse of the encoder
$\mathcal{E}_2$ (which was used in Step (E4)) and also a combined
error-erasure decoder $\mathcal{D}_2 : (F \cup \{?\})^{\Delta_2} \to \mathcal{C}_2$ that recovers correctly
any pattern of $a$ errors (over $F$) and $b$ erasures, provided that
$2a + b < \delta_2\Delta_2$. The next lemma provides an upper bound on
the Hamming distance between $\hat{w}$ and $w$ (as words of length
$n$ over $\Phi_m$).

Lemma 5.1: Under the assumption (19), the Hamming distance
between $\hat{w}$ and $w$ (as words over $\Phi_m$) is less than $\mu n$.






Input: Information word $\eta = (\eta_v)_{v \in V''}$ of length $n$ over $F^{R_1 \Delta_1}$.

(E1) Using an encoder $\mathcal{E}_1 : F^{R_1 \Delta_1} \to \mathcal{C}_1$, map $\eta$ into a codeword $c$ of $\mathbb{C}_1$ by

    $(c)_{E_1(v)} \leftarrow \mathcal{E}_1(\eta_v)$ ,  $v \in V''$ .

(E2) Fix some $((1-r_0)\Delta_1) \times \Delta_1$ parity-check matrix $H_0$ of $\mathcal{C}_0$ over $F$, and compute the $n$ vectors

    $h_u \leftarrow H_0 \cdot (c)_{E_1(u)}$ ,  $u \in V'$ ,

to produce the list $s = (h_u)_{u \in V'}$.

(E3) Regard $s$ as a word of length $(1-r_0)\Delta_1 n$ $(= r_m R_2 \Delta_2 n)$ over $F$, and map it by an encoder of $C_m$ into a codeword $w = (w_v)_{v \in V''}$ of $C_m$.

(E4) Using an encoder $\mathcal{E}_2 : F^{R_2 \Delta_2} \to \mathcal{C}_2$, map $w$ into a codeword $d$ of $\mathbb{C}_2$ by

    $(d)_{E_2(v)} \leftarrow \mathcal{E}_2(w_v)$ ,  $v \in V''$ .

Output: Word $x = (x_u)_{u \in V'}$ in $(\Phi_1 \times \Phi_2)^n$ whose components are given by the pairs

    $x_u = \bigl( (c)_{E_1(u)}, (d)_{E_2(u)} \bigr)$ ,  $u \in V'$ .

Fig. 2. Encoder for $\mathbb{C}$.
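Step (E2) is a purely local computation: at every vertex $u \in V'$, the symbols on the incident edges are multiplied by one fixed parity-check matrix. The following Python sketch illustrates that step only. It assumes, for simplicity, a prime field GF(p) in place of the general field $F$, and the names `syndromes`, `edge_values` and `incident_edges`, as well as the toy matrix and graph, are hypothetical and not taken from the paper.

```python
from typing import Dict, List, Sequence


def syndromes(H0: Sequence[Sequence[int]],
              edge_values: Dict[int, int],
              incident_edges: Dict[str, List[int]],
              p: int) -> Dict[str, List[int]]:
    """For every vertex u, return H0 * (c restricted to the edges at u) mod p.

    Assumption: arithmetic is over the prime field GF(p); the paper's field F
    may be any finite field.
    """
    result = {}
    for u, edges in incident_edges.items():
        sub_block = [edge_values[e] for e in edges]        # plays the role of (c)_{E_1(u)}
        result[u] = [sum(h * x for h, x in zip(row, sub_block)) % p for row in H0]
    return result


# Toy instance (hypothetical): p = 5, degree 3, a single parity-check row.
H0 = [[1, 1, 1]]
c = {0: 1, 1: 2, 2: 2, 3: 4, 4: 0, 5: 3}
E1 = {"u0": [0, 1, 2], "u1": [3, 4, 5]}
s = syndromes(H0, c, E1, 5)
```

Each vertex is handled with a constant amount of work when the degree is fixed, which is the reason the step contributes only linear time overall.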



[Diagram: the four stages Step (E1), Step (E2), Step (E3), Step (E4), drawn over the two graphs, Graph $\mathcal{G}_1$ and Graph $\mathcal{G}_2$.]

Fig. 3. Schematic diagram of the encoder for $\mathbb{C}$.
Input: Received word $y = (y_u)_{u \in V'}$ in $(\Phi \cup \{?\})^n$.

(D1) For $u \in V'$ do:

    $(z)_{E_2(u)} \leftarrow y_{u,2}$ if $y_{u,2} \in \Phi_2$ ,
    $(z)_{E_2(u)} \leftarrow ?? \ldots ?$ if $y_{u,2} = ?$ .

(D2) For $v \in V''$ do: $\hat{w}_v \leftarrow \mathcal{E}_2^{-1}\bigl( \mathcal{D}_2( (z)_{E_2(v)} ) \bigr)$.

(D3) Apply a decoder of $C_m$ to $\hat{w} = (\hat{w}_v)_{v \in V''}$ to produce an information word $\hat{s} \in F^{(1-r_0)\Delta_1 n}$.

(D4) Apply a decoder for $(\mathbb{C}_1(\hat{s}))_{\Phi_1}$ to $(y_{u,1})_{u \in V'}$, as described in Section IV-A, to produce an information word $\hat{\eta}$.

Output: Information word $\hat{\eta} = (\hat{\eta}_v)_{v \in V''}$ of length $n$ over $F^{R_1 \Delta_1}$.

Fig. 4. Decoder for $(\mathbb{C})_\Phi$.
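Step (D1) only copies the second component of each received symbol onto the edges of $\mathcal{G}_2$, turning an erased symbol into a block of erasures. A minimal Python sketch of this initialization follows; representing the erasure symbol "?" by `None`, and the names `init_edge_labels`, `y2` and `incident_edges`, are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Sequence

ERASURE = None  # stands for the symbol '?'


def init_edge_labels(y2: Dict[str, Optional[Sequence[int]]],
                     incident_edges: Dict[str, List[int]]) -> Dict[int, Optional[int]]:
    """Copy each received block onto its edges; an erased block erases all of them."""
    z: Dict[int, Optional[int]] = {}
    for u, edges in incident_edges.items():
        block = y2[u]
        for position, e in enumerate(edges):
            z[e] = ERASURE if block is None else block[position]
    return z


# Toy instance (hypothetical): two vertices of degree 2, the second one erased.
E2 = {"u0": [0, 1], "u1": [2, 3]}
y2 = {"u0": (4, 1), "u1": None}
z = init_edge_labels(y2, E2)
```

The output `z` is what the local error-erasure decoders of Step (D2) would then read along the edges at each vertex of $V''$.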



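Proof: Define the function $\chi : (V' \cup V'') \to \{0, \tfrac{1}{2}, 1\}$ by

$$ \chi(u) = \begin{cases}
1 & \text{if } u \in V' \text{ and } y_{u,2} \text{ is in error} \\
\tfrac{1}{2} & \text{if } u \in V' \text{ and } y_{u,2} \text{ is an erasure} \\
1 & \text{if } u \in V'' \text{ and } \hat{w}_u \ne w_u \\
0 & \text{otherwise}
\end{cases} $$

Assuming that $\hat{w} \ne w$, this function satisfies the conditions
of Lemma 4.2 with respect to the graph $\mathcal{G}_2$, where $\sigma n$ equals
$t + (\rho/2)$ and $\tau n$ equals the number of vertices $v \in V''$ such
that $\hat{w}_v \ne w_v$. By that lemma we get

$$ \sqrt{\sigma/\tau} \ge \frac{(\delta_2/2) - (1-\gamma_2)\sigma}{\gamma_2}
   \ge \frac{(\delta_2/2) - \sigma}{\gamma_2}
   > \frac{1 - R - 2\sigma}{2\gamma_2}
   > \frac{\varepsilon}{2\gamma_2} , \tag{20} $$

where $\gamma_2$ stands for $\gamma_{\mathcal{G}_2}$ and the last inequality follows
from (19). Now, by (17) we have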

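$$ \Delta_2 = \frac{(1-r_0)\Delta_1}{r_m R} \ge \frac{8(1-R)}{\mu \varepsilon^2} , $$

from which we get the following upper bound on the square
of $\gamma_2$:

$$ \gamma_2^2 \le \frac{4(\Delta_2 - 1)}{\Delta_2^2} < \frac{4}{\Delta_2} \le \frac{\mu \varepsilon^2}{2(1-R)} . $$

Combining this bound with (20) yields

$$ \sqrt{\sigma/\tau} > \sqrt{\frac{1-R}{2\mu}} , $$

namely, $\tau < 2\mu\sigma/(1-R) < \mu$. □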
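The final step of the proof can be spelled out as follows. This is a sketch that assumes the two bounds are read as $\sqrt{\sigma/\tau} > \varepsilon/(2\gamma_2)$ and $\gamma_2^2 \le \mu\varepsilon^2/(2(1-R))$, and that $2\sigma < 1 - R - \varepsilon$ as in (19):

```latex
\tau < \frac{4\gamma_2^2}{\varepsilon^2}\,\sigma
     \le \frac{4}{\varepsilon^2} \cdot \frac{\mu\varepsilon^2}{2(1-R)}\,\sigma
     = \frac{2\mu\sigma}{1-R}
     < \frac{\mu\,(1-R-\varepsilon)}{1-R}
     < \mu .
```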

It follows from Lemma 5.1 that Step (D2) reduces the
number of errors in $\hat{w}$ to the extent that allows a linear-time
decoder of $C_m$ to fully recover the errors in $\hat{w}$ in Step (D3).
Hence, the list $\hat{s}$, which is computed in Step (D3), is identical
with the list $s$ that was originally encoded in Step (E2).

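Finally, to show that Step (D4) yields complete recovery
from errors, we apply Theorem 4.1 to the parameters of the
code $(\mathcal{C}_0 : \mathcal{C}_1)$. Here $\theta_0 = \kappa \cdot \varepsilon$ and

$$ \gamma_1 = \gamma_{\mathcal{G}_1} < \frac{2}{\sqrt{\Delta_1}} \le \frac{2\varepsilon^{3/2}}{\sqrt{\alpha_R}} \le \frac{\varepsilon^{3/2}}{2\sqrt{(1-R)/\kappa}} ; $$

therefore,

$$ \frac{(\delta_1/2) - \gamma_1 \sqrt{\delta_1/\theta_0}}{1 - \gamma_1} > \frac{1 - R - \varepsilon}{2} $$

and, so, by (19), the conditions of Theorem 4.1 hold for $\sigma =
(1-R-\varepsilon)/2$ (note that $\sqrt{\theta_0 \delta_1} > 2\gamma_1$ under this choice of
parameters, so the corresponding condition of the theorem holds).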

Appendix A 

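We provide here the proofs of Lemmas 3.2 and 4.2.
Given a bipartite graph $\mathcal{G} = (V' : V'', E)$, we associate
with $\mathcal{G}$ a $|V'| \times |V''|$ real matrix $X_{\mathcal{G}}$ whose rows and columns
are indexed by $V'$ and $V''$, respectively, and $(X_{\mathcal{G}})_{u,v} = 1$ if
and only if $\{u, v\} \in E$. With a proper ordering on $V' \cup V''$,
the matrix $X_{\mathcal{G}}$ is related to the adjacency matrix of $\mathcal{G}$ by

$$ A_{\mathcal{G}} = \begin{pmatrix} 0 & X_{\mathcal{G}} \\ X_{\mathcal{G}}^T & 0 \end{pmatrix} . \tag{21} $$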



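Lemma A.1: Let $\mathcal{G} = (V' : V'', E)$ be a bipartite $\Delta$-regular
graph where $|V'| > 1$. Then $\Delta^2$ is the largest eigenvalue of
the (symmetric) matrix $X_{\mathcal{G}}^T X_{\mathcal{G}}$ and the all-one vector $\mathbf{1}$ is a
corresponding eigenvector. The second largest eigenvalue of
$X_{\mathcal{G}}^T X_{\mathcal{G}}$ is $\gamma_{\mathcal{G}}^2 \Delta^2$.

Proof: We compute the square of $A_{\mathcal{G}}$,

$$ A_{\mathcal{G}}^2 = \begin{pmatrix} X_{\mathcal{G}} X_{\mathcal{G}}^T & 0 \\ 0 & X_{\mathcal{G}}^T X_{\mathcal{G}} \end{pmatrix} , $$

and recall the following two known facts:

(i) $X_{\mathcal{G}}^T X_{\mathcal{G}}$ and $X_{\mathcal{G}} X_{\mathcal{G}}^T$ have the same set of eigenvalues,
each with the same multiplicity [14, Theorem 16.2].

(ii) If $\lambda$ is an eigenvalue of $A_{\mathcal{G}}$, then so is $-\lambda$, with the same
multiplicity [8, Proposition 1.1.4].

We conclude that $\lambda$ is an eigenvalue of $A_{\mathcal{G}}$ if and only if $\lambda^2$
is an eigenvalue of $X_{\mathcal{G}}^T X_{\mathcal{G}}$; furthermore, when $\lambda \ne 0$, both these
eigenvalues have the same multiplicities in their respective
matrices. The result readily follows. □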
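Two ingredients of Lemma A.1 and its proof are easy to check numerically on a small graph: that $\mathbf{1}$ is an eigenvector of $X_{\mathcal{G}}^T X_{\mathcal{G}}$ with eigenvalue $\Delta^2$, and that $A_{\mathcal{G}}^2$ has the stated block-diagonal form. The Python sketch below does this for a 3-regular bipartite graph on 5 + 5 vertices. The circulant construction and all function names are assumptions chosen for illustration, and the sketch does not compute the second largest eigenvalue.

```python
def matmul(A, B):
    """Plain-Python product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def circulant_biadjacency(n, shifts):
    """X[u][v] = 1 iff (v - u) mod n is one of the shifts; the graph is len(shifts)-regular."""
    return [[1 if (v - u) % n in shifts else 0 for v in range(n)] for u in range(n)]


n, shifts = 5, {0, 1, 3}
delta = len(shifts)
X = circulant_biadjacency(n, shifts)
Xt = transpose(X)
XtX = matmul(Xt, X)

# Applying X^T X to the all-one vector amounts to taking row sums.
image_of_ones = [sum(row) for row in XtX]

# Adjacency matrix in the block form of (21), and its square.
A = [[0] * n + X[u] for u in range(n)] + [Xt[v] + [0] * n for v in range(n)]
A2 = matmul(A, A)
upper_left = [row[:n] for row in A2[:n]]
lower_right = [row[n:] for row in A2[n:]]
off_diagonal_is_zero = all(entry == 0 for row in A2[:n] for entry in row[n:])
```

Here `image_of_ones` equals $\Delta^2$ in every coordinate, and the diagonal blocks of `A2` coincide with $X_{\mathcal{G}} X_{\mathcal{G}}^T$ and $X_{\mathcal{G}}^T X_{\mathcal{G}}$.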

For real column vectors $x, y \in \mathbb{R}^m$, let $\langle x, y \rangle$ be the scalar
product $x^T y$ and $\|x\|$ be the norm $\sqrt{\langle x, x \rangle}$.

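Lemma A.2: Let $\mathcal{G} = (V' : V'', E)$ be a bipartite $\Delta$-regular
graph where $|V'| = n > 1$ and let $s = (s_u)_{u \in V'}$ and $t =
(t_u)_{u \in V''}$ be two column vectors in $\mathbb{R}^n$. Denote by $\sigma$ and $\tau$
the averages

$$ \sigma = \frac{1}{n} \sum_{u \in V'} s_u \quad \text{and} \quad \tau = \frac{1}{n} \sum_{u \in V''} t_u , $$

and let the column vectors $y$ and $z$ in $\mathbb{R}^n$ be given by

$$ y = s - \sigma \cdot \mathbf{1} \quad \text{and} \quad z = t - \tau \cdot \mathbf{1} . $$

Define the vector $x \in \mathbb{R}^{2n}$ by

$$ x = \begin{pmatrix} s \\ t \end{pmatrix} . $$

Then,

$$ \bigl| \langle x, A_{\mathcal{G}} x \rangle - 2\sigma\tau\Delta n \bigr| \le 2 \gamma_{\mathcal{G}} \Delta \|y\| \cdot \|z\| . $$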
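As a sanity check of the statement, consider the complete bipartite graph $K_{n,n}$, for which $\Delta = n$, the matrix $X_{\mathcal{G}}$ is all-one, and the second largest eigenvalue of $X_{\mathcal{G}}^T X_{\mathcal{G}}$ is 0, so $\gamma_{\mathcal{G}} = 0$ and the lemma forces $\langle x, A_{\mathcal{G}} x \rangle = 2\sigma\tau\Delta n$ exactly. The Python sketch below verifies this identity with exact rational arithmetic; the choice of graph, the sample vectors and the function name are assumptions for illustration.

```python
from fractions import Fraction


def quadratic_form_complete_bipartite(s, t):
    """<x, A x> for x = (s; t) on K_{n,n}: twice the sum of s_u * t_v over all edges {u, v}."""
    return 2 * sum(su * tv for su in s for tv in t)


s = [Fraction(1), Fraction(0), Fraction(1, 2)]
t = [Fraction(1, 2), Fraction(1), Fraction(1)]
n = delta = 3                      # K_{3,3} is 3-regular

sigma = sum(s) / n                 # average of s over V'
tau = sum(t) / n                   # average of t over V''

lhs = quadratic_form_complete_bipartite(s, t)
rhs = 2 * sigma * tau * delta * n
```

For graphs with $\gamma_{\mathcal{G}} > 0$ the two sides may differ, and the lemma bounds the gap by $2\gamma_{\mathcal{G}}\Delta\|y\| \cdot \|z\|$.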

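Proof: First, it is easy to see that $X_{\mathcal{G}} \mathbf{1} = X_{\mathcal{G}}^T \mathbf{1} = \Delta \cdot \mathbf{1}$ and
that $\langle y, \mathbf{1} \rangle = \langle z, \mathbf{1} \rangle = 0$; these equalities, in turn, yield the
relationship:

$$ \langle y, X_{\mathcal{G}} z \rangle = \langle s, X_{\mathcal{G}} t \rangle - \sigma\tau\Delta n . $$

Secondly, from (21) we get that

$$ \langle x, A_{\mathcal{G}} x \rangle = 2 \langle s, X_{\mathcal{G}} t \rangle . $$

Hence, the lemma will be proved once we show that

$$ | \langle y, X_{\mathcal{G}} z \rangle | \le \gamma_{\mathcal{G}} \Delta \|y\| \cdot \|z\| . \tag{22} $$

Let

$$ \lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_n $$

be the eigenvalues of $X_{\mathcal{G}}^T X_{\mathcal{G}}$ and let $v_1, v_2, \ldots, v_n$ be
corresponding orthonormal eigenvectors where, by Lemma A.1,

$$ \lambda_1 = \Delta^2 , \quad \lambda_2 = \gamma_{\mathcal{G}}^2 \Delta^2 , \quad \text{and} \quad v_1 = (1/\sqrt{n}) \cdot \mathbf{1} . $$

Write

$$ z = \sum_{i=1}^{n} \beta_i v_i , $$

where $\beta_i = \langle z, v_i \rangle$. Recall, however, that $\beta_1 = (1/\sqrt{n})
\langle z, \mathbf{1} \rangle = 0$; so,

$$ \| X_{\mathcal{G}} z \|^2 = \langle z, X_{\mathcal{G}}^T X_{\mathcal{G}} z \rangle
   = \sum_{i=2}^{n} \sum_{j=2}^{n} \lambda_j \beta_i \beta_j \langle v_i, v_j \rangle
   = \sum_{i=2}^{n} \lambda_i \beta_i^2
   \le \lambda_2 \sum_{i=2}^{n} \beta_i^2 = \lambda_2 \|z\|^2 = \gamma_{\mathcal{G}}^2 \Delta^2 \|z\|^2 . $$

The desired result (22) is now obtained from the Cauchy-Schwarz
inequality. □

Lemma A.3: Let $\mathcal{G} = (V' : V'', E)$ be a bipartite $\Delta$-regular
graph where $|V'| = n > 1$ and let $\chi : (V' \cup V'') \to \mathbb{R}$ be a
function on the vertices of $\mathcal{G}$. Define the function $w : E \to \mathbb{R}$
and the average $\mathsf{E}_{\mathcal{G}}\{w\}$ by

$$ w(e) = \chi(u)\chi(v) \quad \text{for every edge } e = \{u, v\} \text{ in } \mathcal{G} $$

and

$$ \mathsf{E}_{\mathcal{G}}\{w\} = \frac{1}{\Delta n} \sum_{e \in E} w(e) . $$

Then

$$ \bigl| \mathsf{E}_{\mathcal{G}}\{w\} - \mathsf{E}'_{\mathcal{G}}\{\chi\} \cdot \mathsf{E}''_{\mathcal{G}}\{\chi\} \bigr| \le \gamma_{\mathcal{G}} \sqrt{ \mathsf{Var}'_{\mathcal{G}}\{\chi\} \cdot \mathsf{Var}''_{\mathcal{G}}\{\chi\} } , $$

where

$$ \mathsf{E}'_{\mathcal{G}}\{\chi\} = \frac{1}{n} \sum_{u \in V'} \chi(u) , \quad \mathsf{E}''_{\mathcal{G}}\{\chi\} = \frac{1}{n} \sum_{u \in V''} \chi(u) , $$

and

$$ \mathsf{Var}'_{\mathcal{G}}\{\chi\} = \mathsf{E}'_{\mathcal{G}}\{\chi^2\} - (\mathsf{E}'_{\mathcal{G}}\{\chi\})^2 , \quad
   \mathsf{Var}''_{\mathcal{G}}\{\chi\} = \mathsf{E}''_{\mathcal{G}}\{\chi^2\} - (\mathsf{E}''_{\mathcal{G}}\{\chi\})^2 . $$

Proof: Define the column vectors

$$ s = (\chi(u))_{u \in V'} , \quad t = (\chi(u))_{u \in V''} , \quad \text{and} \quad x = \begin{pmatrix} s \\ t \end{pmatrix} , $$

and denote by $\sigma$ and $\tau$ the averages

$$ \sigma = \frac{1}{n} \sum_{u \in V'} s_u \quad \text{and} \quad \tau = \frac{1}{n} \sum_{u \in V''} t_u . $$

The following equalities are easily verified:

$$ \mathsf{E}_{\mathcal{G}}\{w\} = \frac{\langle x, A_{\mathcal{G}} x \rangle}{2 \Delta n} , \quad
   \mathsf{E}'_{\mathcal{G}}\{\chi\} = \sigma , \quad \mathsf{E}''_{\mathcal{G}}\{\chi\} = \tau , $$

$$ \mathsf{Var}'_{\mathcal{G}}\{\chi\} = \frac{1}{n} \| s - \sigma \cdot \mathbf{1} \|^2 , \quad \text{and} \quad
   \mathsf{Var}''_{\mathcal{G}}\{\chi\} = \frac{1}{n} \| t - \tau \cdot \mathbf{1} \|^2 . $$

The result now follows from Lemma A.2. □

Proof of Lemma 3.2: Using the notation of Lemma A.3,
write

$$ \mathsf{E}_{\mathcal{G}}\{w\} = \frac{1}{\Delta n} \sum_{u \in V'} \sum_{v \in \mathcal{N}(u)} \chi(u)\chi(v) , \tag{23} $$

$$ \mathsf{E}'_{\mathcal{G}}\{\chi\} = \frac{1}{n} \sum_{u \in V'} \chi(u) = \sigma , \tag{24} $$

and

$$ \mathsf{E}''_{\mathcal{G}}\{\chi\} = \frac{1}{n} \sum_{v \in V''} \chi(v) = \tau . \tag{25} $$

Since the range of $\chi$ is restricted to the interval $[0, 1]$, we have

$$ \mathsf{E}'_{\mathcal{G}}\{\chi^2\} \le \mathsf{E}'_{\mathcal{G}}\{\chi\} \quad \text{and} \quad \mathsf{E}''_{\mathcal{G}}\{\chi^2\} \le \mathsf{E}''_{\mathcal{G}}\{\chi\} ; $$

hence, the values $\mathsf{Var}'_{\mathcal{G}}\{\chi\}$ and $\mathsf{Var}''_{\mathcal{G}}\{\chi\}$ can be bounded from
above by

$$ \mathsf{Var}'_{\mathcal{G}}\{\chi\} \le \sigma - \sigma^2 \quad \text{and} \quad \mathsf{Var}''_{\mathcal{G}}\{\chi\} \le \tau - \tau^2 . \tag{26} $$

Substituting (23)-(26) into Lemma A.3 yields

$$ \Bigl| \frac{1}{\Delta n} \Bigl( \sum_{u \in V'} \sum_{v \in \mathcal{N}(u)} \chi(u)\chi(v) \Bigr) - \sigma\tau \Bigr|
   \le \gamma_{\mathcal{G}} \sqrt{ \sigma(1-\sigma)\tau(1-\tau) } ; $$

so,

$$ \frac{1}{\Delta n} \sum_{u \in V'} \sum_{v \in \mathcal{N}(u)} \chi(u)\chi(v)
   \le \sigma\tau + \gamma_{\mathcal{G}} \sqrt{ \sigma(1-\sigma)\tau(1-\tau) } $$
$$ = (1-\gamma_{\mathcal{G}})\sigma\tau + \gamma_{\mathcal{G}} \sqrt{\sigma\tau} \Bigl( \sqrt{\sigma\tau} + \sqrt{(1-\sigma)(1-\tau)} \Bigr) $$
$$ \le (1-\gamma_{\mathcal{G}})\sigma\tau + \gamma_{\mathcal{G}} \sqrt{\sigma\tau} , $$

as claimed. □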
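The last inequality in the proof of Lemma 3.2 rests on $\sqrt{\sigma\tau} + \sqrt{(1-\sigma)(1-\tau)} \le 1$ for $\sigma, \tau \in [0, 1]$. The Python sketch below compares the intermediate bound with the final bound on a grid of values. It is a numerical illustration only, not a proof, and the function names and the grid are assumptions.

```python
import math


def lemma_3_2_bound(sigma: float, tau: float, gamma: float) -> float:
    """Final bound: (1 - gamma) * sigma * tau + gamma * sqrt(sigma * tau)."""
    return (1 - gamma) * sigma * tau + gamma * math.sqrt(sigma * tau)


def intermediate_bound(sigma: float, tau: float, gamma: float) -> float:
    """Bound obtained before the last step: sigma*tau + gamma*sqrt(sigma(1-sigma)tau(1-tau))."""
    return sigma * tau + gamma * math.sqrt(sigma * (1 - sigma) * tau * (1 - tau))


grid = [i / 20 for i in range(21)]          # values 0, 0.05, ..., 1
worst_gap = max(intermediate_bound(s, t, g) - lemma_3_2_bound(s, t, g)
                for s in grid for t in grid for g in grid)
```

On this grid `worst_gap` is zero up to floating-point rounding, attained when $\sigma = \tau$, which is where the last inequality holds with equality.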

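Proof of Lemma 4.2: We compute lower and upper bounds
on the average

$$ \frac{1}{\Delta n} \sum_{v \in V''} \sum_{u \in \mathcal{N}(v)} \chi(u)\chi(v) . $$

On the one hand, this average equals

$$ \frac{1}{\Delta n} \sum_{v \in V'' :\, \chi(v) > 0} \chi(v) \sum_{u \in \mathcal{N}(v)} \chi(u)
   \ge \frac{1}{\Delta n} \cdot \frac{\delta\Delta}{2} \sum_{v \in V''} \chi(v) = \frac{\delta\tau}{2} , $$

where the inequality follows from the assumed conditions on
$\chi$ (each inner sum is at least $\delta\Delta/2$). On the other hand, this average also equals

$$ \frac{1}{\Delta n} \sum_{u \in V'} \sum_{v \in \mathcal{N}(u)} \chi(u)\chi(v) \le (1-\gamma_{\mathcal{G}})\sigma\tau + \gamma_{\mathcal{G}} \sqrt{\sigma\tau} , $$

where the inequality follows from Lemma 3.2. Combining
these two bounds we get

$$ \frac{\delta\tau}{2} \le (1-\gamma_{\mathcal{G}})\sigma\tau + \gamma_{\mathcal{G}} \sqrt{\sigma\tau} , $$

and the result is now obtained by dividing by $\gamma_{\mathcal{G}} \tau$ and
rearranging terms. □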
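The rearrangement at the end of the proof can be written out explicitly. The following is a sketch of that algebra, assuming $\gamma_{\mathcal{G}} > 0$ and $\tau > 0$ so that the division is allowed:

```latex
\frac{\delta\tau}{2} \le (1-\gamma_{\mathcal{G}})\sigma\tau + \gamma_{\mathcal{G}}\sqrt{\sigma\tau}
\;\Longrightarrow\;
\frac{\delta}{2\gamma_{\mathcal{G}}} \le \frac{(1-\gamma_{\mathcal{G}})\sigma}{\gamma_{\mathcal{G}}} + \sqrt{\frac{\sigma}{\tau}}
\;\Longrightarrow\;
\sqrt{\frac{\sigma}{\tau}} \ge \frac{(\delta/2) - (1-\gamma_{\mathcal{G}})\sigma}{\gamma_{\mathcal{G}}} .
```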

Appendix B 

When analyzing the complexity of the algorithm in Figure 1,
one can notice that the decoder $\mathcal{D} \in \{\mathcal{D}', \mathcal{D}''\}$ needs to be
applied at vertex $u$ only if $(z)_{E(u)}$ has been modified since the
last application of $\mathcal{D}$ at that vertex. Based on this observation,
we prove the following lemma.

Lemma B.1: The number of (actual) applications of the
decoders $\mathcal{D}'$ and $\mathcal{D}''$ in the algorithm in Figure 1 can be
bounded from above by $\omega \cdot n$, where



u) = 2- 



log 



1 + ^ 



1 - 



65 



Proof: Define £y by 



i T = 2 • 



It is easy to verify that 



65 



ir/2 



(27) 



In the first $i_T$ iterations in Figure 1 we apply the decoder $\mathcal{D}$
(which is either $\mathcal{D}'$ or $\mathcal{D}''$) at most $i_T \cdot n$ times.

Next, we evaluate the total number of applications of the
decoder $\mathcal{D}$ in iterations $i = i_T + 1, i_T + 2, \ldots, \nu$. We hereafter
use the notations $U_i$ and $S_i$ as in the proof of Theorem 4.1.
Recall that we need to apply the decoder $\mathcal{D}$ to $(z)_{E(u)}$ for
a vertex $u \in U_{i+2}$ only if at least one entry in $(z)_{E(u)}$ (say,
the one that is indexed by the edge $\{u, v\} \in E(u)$)
has been altered during iteration $i + 1$. Such an alteration may
occur only if $v$ is a vertex in $U_{i+1}$ with an adjacent vertex in
$S_i$. We conclude that $\mathcal{D}$ needs to be applied at vertex $u$ during
iteration $i + 2$ only if $u \in \mathcal{N}(\mathcal{N}(S_i))$. The number of such
vertices $u$, in turn, is at most $\Delta^2 |S_i| = \Delta^2 \cdot \sigma_i n$.

We now sum the values of $\Delta^2 \sigma_i n$ over iterations $i = i_T +
1, i_T + 2, \ldots, \nu$:



AV 



E - 

=i T + l 



/L("-1)/2J L("-2)/2j 

= A 2 n a 2 j+i + E G -- 2 

\ 3=»t/2 j=tr/2 

L("-i)/aj , e , 

< A 2 n ■ J2 1 + 1 ) > < 28 > 

where the last inequality is due to il Q . 

From d 1 5I > (and by neglecting a positive term), we obtain 

for even i > ir- Therefore, the expression in A28I is bounded 
from above by 

„2 \ *r 



A 2 n 



1 - 




where the inequality follows from Hit . 

Adding now the number of applications of the decoder $\mathcal{D}$
during the first $i_T$ iterations, we conclude that the total number
of applications of the decoder $\mathcal{D}$ is at most $\omega \cdot n$, where



4 7 r 2 



□ 
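The observation that opens this appendix, namely that a local decoder needs to be re-applied at a vertex only if one of its incident edges changed since its last application, can be sketched as a "dirty flag" schedule. The Python code below is an illustration under stated assumptions and not the decoder of the paper: the local decoder is a majority vote (a repetition code), the graph is $K_{3,3}$, and the names `decode_with_dirty_flags`, `majority`, `sides` and `incident_edges` are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence


def decode_with_dirty_flags(sides: Sequence[List[str]],
                            incident_edges: Dict[str, List[int]],
                            z: Dict[int, int],
                            local_decode: Callable[[List[int]], List[int]],
                            max_iterations: int) -> int:
    """Alternate between the two vertex sides; return the number of decoder applications."""
    # Map every edge to its endpoints, so that a change can wake up the other endpoint.
    endpoints: Dict[int, List[str]] = {}
    for vertex, edges in incident_edges.items():
        for e in edges:
            endpoints.setdefault(e, []).append(vertex)

    dirty = set(incident_edges)          # every vertex must be visited at least once
    applications = 0
    for i in range(max_iterations):
        if not dirty:
            break                        # nothing changed anywhere: stop early
        for u in sides[i % 2]:
            if u not in dirty:
                continue                 # no incident edge changed since the last visit
            dirty.discard(u)
            applications += 1
            old = [z[e] for e in incident_edges[u]]
            new = local_decode(old)
            for e, before, after in zip(incident_edges[u], old, new):
                if before != after:
                    z[e] = after
                    dirty.update(w for w in endpoints[e] if w != u)
    return applications


def majority(block: List[int]) -> List[int]:
    """Toy local decoder: replace the block by its most common value."""
    value = Counter(block).most_common(1)[0][0]
    return [value] * len(block)


# K_{3,3}: edge 3*i + j joins left vertex Li and right vertex Rj.
left = ["L0", "L1", "L2"]
right = ["R0", "R1", "R2"]
incident = {f"L{i}": [3 * i + j for j in range(3)] for i in range(3)}
incident.update({f"R{j}": [3 * i + j for i in range(3)] for j in range(3)})
z = {e: 0 for e in range(9)}
z[4] = 1                                  # a single corrupted edge
count = decode_with_dirty_flags((left, right), incident, z, majority, max_iterations=6)
```

In this toy run the decoder is applied 6 times in total, whereas applying it at every vertex of the active side in each of the 6 iterations would take 18 applications.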